Random projections for large-scale speaker search
نویسندگان
چکیده
This paper describes a system for indexing acoustic feature vectors for large-scale speaker search using random projections. Given one or more target feature vectors, large-scale speaker search enables returning similar vectors (in a nearest-neighbors fashion) in sublinear time. The speaker feature space is comprised of i-vectors, derived from Gaussian Mixture Model supervectors. The index and search algorithm is derived from locality sensitive hashing with novel approaches in neighboring bin approximation for improving the miss rate at specified false alarm thresholds. The distance metric for determining the similarity between vectors is the cosine distance. This approach significantly reduced the search space by 70% with minimal increase in miss rate. When combined with further dimensionality reduction, a reduction of the search space by over 90% is also possible. All experiments are based on the NIST SRE 2010 evaluation.
منابع مشابه
Sign Stable Random Projections for Large-Scale Learning
In this paper, we study the use of “sign α-stable random projections” (where 0 < α ≤ 2) for building basic data processing tools in the context of large-scale machine learning applications (e.g., classification, regression, clustering, and near-neighbor search). After the processing by sign stable random projections, the inner products of the processed data approximate various types of nonlinea...
متن کاملRandom Projections for Anchor-based Topic Inference
Recent spectral topic discovery methods are extremely fast at processing large document corpora, but scale poorly with the size of the input vocabulary. Random projections are vital to ensure speed and limit memory usage. We empirically evaluate several methods for generating random projections and measure the effect of parameters such as sparsity and dimensionality. We find that methods with s...
متن کاملb-Bit Minwise Hashing for Large-Scale Learning
Abstract Minwise hashing is a standard technique in the context of search for efficiently computing set similarities. The recent development of b-bit minwise hashing provides a substantial improvement by storing only the lowest b bits of each hashed value. In this paper, we demonstrate that b-bit minwise hashing can be naturally integrated with linear learning algorithms such as linear SVM and ...
متن کاملHybrid Probabilistic Search Methods for Simulation Optimization
Discrete-event simulation based optimization is the process of finding the optimum design of a stochastic system when the performance measure(s) could only be estimated via simulation. Randomness in simulation outputs often challenges the correct selection of the optimum. We propose an algorithm that merges Ranking and Selection procedures with a large class of random search methods for continu...
متن کاملLPKP: location-based probabilistic key pre-distribution scheme for large-scale wireless sensor networks using graph coloring
Communication security of wireless sensor networks is achieved using cryptographic keys assigned to the nodes. Due to resource constraints in such networks, random key pre-distribution schemes are of high interest. Although in most of these schemes no location information is considered, there are scenarios that location information can be obtained by nodes after their deployment. In this paper,...
متن کامل